DCT-based video features for audio-visual speech recognition
نویسندگان
چکیده
Encouraged by the good performance of the DCT in audiovisual speech recognition [1], we investigate how the selection of the DCT coefficients influences the recognition scores in a hybrid ANN/HMM audio-visual speech recognition system on a continuous word recognition task with a vocabulary of 30 numbers. Three sets of coefficients, based on the mean energy, the variance and the variance relative to the mean value, were chosen. The performance of these coefficients is evaluated in a video only and an audio-visual recognition scenario with varying Signal to Noise Ratios (SNR). The audio-visual tests are performed with 5 types of additional noise at 12 SNR values each. Furthermore the results of the DCT based recognition are compared to those obtained via chroma-keyed geometric lip features [2]. In order to achieve this comparison, a second audio-visual database without chroma-key has been recorded. This database has similar content but a different speaker.
منابع مشابه
Dct-based Video Features for Audio-v
Encouraged by the good performance of the DCT in audiovisual speech recognition [1], we investigate how the selection of the DCT coefficients influences the recognition scores in a hybridANN/HMM audio-visual speech recognition system on a continuous word recognition task with a vocabulary of 30 numbers. Three sets of coefficients, based on the mean energy, the variance and the variance relative...
متن کاملAudio-Visual Speech Recognition for Slavonic Languages (Czech and Russian)
The paper presents the results of recent experiments with audio-visual speech recognition for two popular Slavonic languages: Russian and Czech. The description of test applied tasks, the process of multimodal databases collection and data pre-processing, methods for visual features extraction (geometric shape-based features; DCT and PCA pixel-based visual parameterization) as well as models of...
متن کاملA Comparison of Visual Features for Audio-Visual Automatic Speech Recognition
The use of visual information from speaker’s mouth region has shown to improve the performance of Automatic Speech Recognition (ASR) systems. This is particularly useful in presence of noise, which even in moderate form severely degrades the speech recognition performance of systems using only audio information. Various sets of features extracted from speaker’s mouth region have been used to im...
متن کاملLarge-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins Summer 2000 Workshop
We report a summary of the Johns Hopkins Summer 2000 Workshop on audio-visual automatic speech recognition (ASR) in the large-vocabulary, continuous speech domain. Two problems of audio-visual ASR were mainly addressed: Visual feature extraction and audio-visual information fusion. First, image transform and model-based visual features were considered, obtained by means of the discrete cosine t...
متن کاملA system for audio-visual speech recognition
In this work, a system of audio visual speech recognition will be presented. A new hybrid visual feature combination, which is suitable for audio -visual speech recognition was implemented. The features comprise both the shape and the appearance of lips, the dimensional reduction is applied using discrete cosine transform (DCT). A large visual speech database of the German language has been ass...
متن کامل